Elemental confounds
Four elemental confounds
True confounds.
Not stratifying on (controlling for) Z will yield a spurious relationship between X and Y. That is, a correlation of X and Y (or regression) will be non-zero, even though there is no causal relationship from one to another.
Create two plots, one showing the relationship between marriage rate and divorce rate and another showing the relationship between median age at marriage and divorce rate.
p1 <- d %>% ggplot(aes(x = Marriage, y = Divorce)) +
geom_point(color = "#1c5253") +
geom_smooth(method = "lm", color = "black") +
labs(x = "marriage rate", y = "divorce rate")
p2 <- d %>% ggplot(aes(x = MedianAgeMarriage, y = Divorce)) +
geom_point(color = "#1c5253") +
geom_smooth(method = "lm", color = "black") +
labs(x = "median age at marriage", y = "divorce rate")
(p1 | p2)Model the relationship between Divorce Rate (D) and Marriage Rate (M). (Standardize both first.) Be sure to do the following:
\[\begin{align*} D_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta_MM_i \\ \alpha &\sim \text{Normal}(0, 0.2) \\ \beta_M &\sim \text{Normal}(0, 0.5) \\ \sigma &\sim \text{Exponential}(1) \\ \end{align*}\]
# compute percentile interval of mean
M_seq <- seq( from=-3 , to=3.2 , length.out=30 )
mu <- link( m3.1 , data=list(M=M_seq) )
mu.mean <- apply( mu , 2, mean )
mu.PI <- apply( mu , 2 , PI )
# plot it all
plot( D ~ M , data=d , col= "#1c5253", xlab = "marriage rate", ylab = "divorce rate" )
lines( M_seq , mu.mean , lwd=2 )
shade( mu.PI , M_seq ) mean sd 5.5% 94.5%
a -1.983311e-07 0.1082467 -0.1729993 0.1729989
bM 3.500315e-01 0.1259279 0.1487743 0.5512886
sigma 9.102681e-01 0.0898631 0.7666495 1.0538867
Now we’re going to incorporate state age (median age at marriage) into our model. This is the DAG proposed by RM. What does this DAG represent?
DAG models help us to see conditional independencies.
How does this change with a new DAG?
dag3.3 <- dagitty( "dag{ A -> D; A -> M }" )
coordinates(dag3.3) <- list( x=c(A=0,D=1,M=2) , y=c(A=0,D=1,M=0) )
drawdag( dag3.3 )D _||_ M | A
\[\begin{align*} D_i &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= \alpha + \beta_AA_i + \beta_MM_i\\ \alpha &\sim \text{Normal}(0, 0.2) \\ \beta_A &\sim \text{Normal}(0, 0.5) \\ \beta_M &\sim \text{Normal}(0, 0.5) \\ \sigma &\sim \text{Exponential}(1) \\ \end{align*}\]
flist <- alist(
D ~ dnorm( mu, sigma),
mu <- a + bA * A + bM * M,
a ~ dnorm(0, 0.2),
bA ~ dnorm(0, 0.5),
bM ~ dnorm(0, 0.5),
sigma ~ dexp(1)
)
m3.2 <- quap(flist = flist, data = d)
precis(m3.2) mean sd 5.5% 94.5%
a -9.025558e-07 0.09707599 -0.1551471 0.1551453
bA -6.135138e-01 0.15098352 -0.8548146 -0.3722130
bM -6.538065e-02 0.15077299 -0.3063450 0.1755837
sigma 7.851176e-01 0.07784329 0.6607090 0.9095262
If we want to simulate the effect of manipulating marriage, we use “do calculus.” We do this by effectively “deleting” the arrows going into our manipulation variable (M).
post <- extract.samples(m3.2) # get 10k samples of all parameters
n <- 1e3
As <- sample(d$A, size=n, replace=T) #sample from original data
# simulate D for M=0
DM0 <- with( post ,
rnorm(n, a + bM*0 + bA*As, sigma))
# simulate D for M=1 -- SAME A values
DM1 <- with( post ,
rnorm(n, a + bM*1 + bA*As, sigma))
#contrast
M10_con <- DM1 - DM0
dens(M10_con, lwd=4, col = "#1c5253", xlab="effect of 1SD increase in M")A pipe in a DAG model represents a situation where a variable acts as a mediator between two other variables. In this context, the effect of one variable on another is transmitted through the mediator.
One place that pipes show up is in post-treatment bias.
Suppose you are studying plants in a greenhouse, and you want to know how effective a particular fungal treatment is. Fungus on plants tends to reduce their growth. You plant a bunch of plants, measure them, then apply one of two different treatments. After some time, you measure the plants again, and you measure the amount of fungus on the plants.
Simulate some fake plant data.
set.seed(71)
# number of plants
N <- 100
# simulate initial heights
h0 <- rnorm(N,10,2)
# assign treatments and simulate fungus and growth
treatment <- rep( 0:1 , each=N/2 )
fungus <- rbinom( N , size=1 , prob=0.5 - treatment*0.4 )
h1 <- h0 + rnorm(N, 5 - 3*fungus)
# compose a clean data frame
d <- data.frame( h0=h0 , h1=h1 , treatment=treatment , fungus=fungus )
precis(d) mean sd 5.5% 94.5% histogram
h0 9.95978 2.1011623 6.570328 13.07874 ▁▂▂▂▇▃▂▃▁▁▁▁
h1 14.39920 2.6880870 10.618002 17.93369 ▁▁▃▇▇▇▁▁
treatment 0.50000 0.5025189 0.000000 1.00000 ▇▁▁▁▁▁▁▁▁▇
fungus 0.23000 0.4229526 0.000000 1.00000 ▇▁▁▁▁▁▁▁▁▂
Draw the dag that describes the relationships between these 4 variables.
What are the implied conditional independences?
plant_dag <- dagitty( "dag {
H_0 -> H_1
F -> H_1
T -> F}" )
coordinates( plant_dag ) <- list( x=c(H_0=1.0,T=0,F=0.5,H_1=1),
y=c(H_0=-.5,T=0,F=0.5,H_1=0) )
drawdag( plant_dag )F _||_ H_0
H_0 _||_ T
H_1 _||_ T | F
Let’s start by just modeling growth using our two height variables.
\[\begin{align*} h_{1,i} &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= h_{0,i} \times p \\ p &\sim \text{Log Normal}(0, .25) \\ \sigma &\sim \text{Exponential}(1) \end{align*}\]
Now add treatment to this model.
\[\begin{align*} h_{1,i} &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= h_{0,i} \times p \\ p &= \alpha + \beta_TT_i \\ \alpha &\sim \text{Log Normal}(0, .25) \\ \beta_T &\sim \text{Normal}(0, .5) \\ \sigma &\sim \text{Exponential}(1) \end{align*}\]
flist <- alist(
h1 ~ dnorm(mu, sigma),
mu <- h0*p,
p <- a + bT * treatment,
a ~ dlnorm(0, .25),
bT ~ dnorm(0, .5),
sigma ~ dexp(1)
)
m3.4 <- quap(flist, d)
precis(m3.4) mean sd 5.5% 94.5%
a 1.38168552 0.02519767 1.34141478 1.4219563
bT 0.08366426 0.03431331 0.02882496 0.1385036
sigma 1.74629961 0.12190865 1.55146604 1.9411332
Now add in both treatment and fungus to this model.
\[\begin{align*} h_{1,i} &\sim \text{Normal}(\mu_i, \sigma) \\ \mu_i &= h_{0,i} \times p \\ p &= \alpha + \beta_TT_i + \beta_FF_i \\ \alpha &\sim \text{Log Normal}(0, .25) \\ \beta_T &\sim \text{Normal}(0, .5) \\ \beta_F &\sim \text{Normal}(0, .5) \\ \sigma &\sim \text{Exponential}(1) \end{align*}\]
flist <- alist(
h1 ~ dnorm(mu, sigma),
mu <- h0*p,
p <- a + bT * treatment + bF * fungus,
a ~ dlnorm(0, .25),
bT ~ dnorm(0, .5),
bF ~ dnorm(0, .5),
sigma ~ dexp(1)
)
m3.4 <- quap(flist, d)
precis(m3.4) mean sd 5.5% 94.5%
a 1.482826896 0.02452192 1.44363614 1.52201765
bT 0.001060225 0.02987668 -0.04668849 0.04880894
bF -0.267912187 0.03654981 -0.32632584 -0.20949853
sigma 1.408656542 0.09859148 1.25108831 1.56622477
Stratifying on Z opens up the association between X and Y. We do not want to stratify on Z.
We’ll use the rethinking package to simulate data: